<!DOCTYPE HTML>
<html lang="en">
<head>
<title>StreamDevice: Format Converters</title>
<meta charset="utf-8" />
<link rel="shortcut icon" href="favicon.ico" />
<link rel="stylesheet" type="text/css" href="stream.css" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="author" content="Dirk Zimoch" />
</head>
<body>
<iframe src="nav.html" id="navleft"></iframe>
<h1>Format Converters</h1>

<a name="syntax"></a>
<h2>1. Format Syntax</h2>
<p>
<em>StreamDevice</em> format converters work very similar to the format
converters of the C functions <em>printf()</em> and <em>scanf()</em>.
But <em>StreamDevice</em> provides more different converters and you can
also write your own converters.
Formats are specified in <a href="protocol.html#str">quoted strings</a>
as arguments of <code>out</code> or <code>in</code>
<a href="protocol.html#cmd">commands</a>.
</p>
<p>
A format converter consists of
</p>
<ul>
 <li>The <code>%</code> character</li>
 <li>Optionally a field or record name in <code>()</code></li>
 <li>Optionally flags out of the characters <code>*# +0-?=!</code></li>
 <li>Optionally an integer <em>width</em> field</li>
 <li>Optionally a period character (<code>.</code>) followed
     by an integer <em>precision</em> field (input ony for most formats)</li>
 <li>A conversion character</li>
 <li>Additional information required by some converters</li>
</ul>

<p class="new">
An exception is the sequence <code>%%</code> which stands for a single
literal <code>%</code>.
This has been added for compatibility with the C functions
<em>printf()</em> and <em>scanf()</em>.
It behaves the same as the escaped percent <code>\%</code>.
</p>

<p>
The flags <code>*# +0-</code> work like in the C functions
<em>printf()</em> and <em>scanf()</em>.
The flags <code>?</code>, <code>=</code> and <code>!</code> are extensions.
</p>
<p>
The <code>*</code> flag skips data in input formats.
Input is consumed and parsed, a mismatch is an error, but the read
data is dropped.
This is useful if input contains more than one value.
Example: <code>in "%*f%f";</code> reads the second floating point
number.
</p>
<p>
The <code>#</code> flag may alter the format, depending on the
converter (see below).
</p>
<p>
The '<code> </code>' (space) and <code>+</code> flags usually print a
space or a <code>+</code> sign before positive numbers, where negative
numbers would have a <code>-</code>.
Some converters may redefine the meaning of these flags (see below).
</p>
<p>
The <code>0</code> flag usually says that numbers should be left padded
with <code>0</code> if <em>width</em> is larger than required.
Some converters may redefine the meaning of this flag (see below).
<p>
The <code>-</code> flag usually specifies that output is left justified
if <em>width</em> is larger than required.
Some converters may redefine the meaning of this flag (see below).
</p>
<p>
The <code>?</code> flag makes failing input conversions succeed with
a default zero value (0, 0.0, or "", depending on the format type).
</p>
<p>
The <code>=</code> flag allows to compare input with current values.
It is only allowed in input formats.
Instead of reading a new value from input, the current value is
formatted (like for output) and then compared to the input.
</p>
<p>
The <code>!</code> flag demands that input is exactly <em>width</em>
bytes long (normally <em>width</em> defines the maximum number of
bytes read in many formats).
This feature has been added by Klemen Vodopivec, SNS.
</p>

<h3>Examples:</h3>
<table>
 <tr>
  <td><code>in "%f";</code></td>
  <td>Read a float value</td>
 </tr>
 <tr>
  <td><code>out "%(HOPR)7.4f";</code></td>
  <td>Write the HOPR field as 7 char float with precision 4</td>
 </tr>
 <tr>
  <td><code>out "%#010x";</code></td>
  <td>Write a 0-padded 10 char hex integer using the alternative format (with leading 0x)</td>
 </tr>
 <tr>
  <td><code>in "%[_a-zA-Z0-9]";</code></td>
  <td>Read a string of alphanumerical chars or underscores</td>
 </tr>
 <tr>
  <td><code>in "%*i";</code></td>
  <td>Skip over an integer number</td>
 </tr>
 <tr>
  <td><code>in "%?d";</code></td>
  <td>Read a decimal number or if that fails pretend that value was 0</td>
 </tr>
 <tr>
  <td><code>in "%=.3f";</code></td>
  <td>Assure that the input is equal to the current value formatted as a float with precision 3</td>
 </tr>
 <tr>
  <td><code>in "%!5d";</code></td>
  <td>Expect exactly 5 decimal digits. Fewer digits are considered loss of data and make the format fail.
 </tr>
 <tr>
  <td><code>in "%d%%";</code></td>
  <td>Read a decimal number followed by a % sign</td>
 </tr>
</table>

<a name="types"></a>
<h2>2. Data Types and Record Fields</h2>
<h3>Default fields</h3>
<p>
Every conversion character corresponds to one of the data types DOUBLE,
LONG, ULONG, ENUM, or STRING.
In contrast to to the C functions <em>printf()</em> and <em>scanf()</em>,
it is not required to specify a variable for the conversion.
The variable is typically the <code>VAL</code> or <code>RVAL</code> field
of the record, selected automatically depending on the data type.
Not all data types make sense for all record types.
Refer to the description of <a href="recordtypes.html">supported record
types</a> for details.
</p>
<p>
<em>StreamDevice</em> makes no difference between <code>float</code>
and <code>double</code> nor between <code>short</code>, <code>int</code>
and <code>long</code> values.
Thus, data type modifiers like <code>l</code> or <code>h</code> do not
exist in <em>StreamDevice</em> formats.
</p>

<a name="redirection"></a>
<h3>I/O Redirection to other records or fields</h3>
<p>
To use formats with other than the default fields of a record or even with
fields of other records on the same IOC, use the syntax
<code>%(<var>record</var>.<var>FIEL</var>D)</code>.
If only a field name but no record is given, the active record is assumed.
If only a record name but no field name is given, the <code>VAL</code>
field is assumed.
</p>
<p>
<b>Example 1:</b> <code>out&nbsp;"%(EGU)s";</code> outputs the
<code>EGU</code> field of the active record.
</p>
<p>
<b>Example 2:</b> <code>in&nbsp;"%(<var>otherrecord</var>.RVAL)i";</code>
stores the received integer value in the <code>RVAL</code> field of the
other record and then processes that record.
The other record should probably use <code>DTYP="Raw Soft Channel"</code>
in order to convert <code>RVAL</code> to <code>VAL</code>.
</p>
<p>
<b>Example 3:</b> <code>in&nbsp;"%(<var>otherrecord</var>)f";</code>
stores the received floating point value in the <code>VAL</code> field
of the other record and then processes that record. 
The other record should probably use <code>DTYP="Soft Channel"</code>.
In the unlikely case that the name of the other record is the same as a field
of the active record (e.g. if you name a record "DESC"), then use <code>.VAL</code>
explicitly to refer to the record rather than the field of the active record.
</p>
<p>
This feature is quite useful in the case that one line of input contains more
than one value that need to be stored in multiple records or if one line of
output needs to be contructed from values of multiple records.
In order to avoid using full record names in the protocol file, it is recommended
to pass the name or part of the name (e.g. the device prefix) of the other
record as a <a href="protocol.html#argvar">protocol argument</a>.
In that case the redirection usually looks like this:
<code>in "%(\$1<var>recordpart</var>)f"</code> and the record calls the protocol
like this:
<code>field(INP, "@<var>protocolfile</var> <var>protocol</var>($(PREFIX)) $(PORT)")</code>
using a macro for the prefix part which is then used for <code>\$1</code>.
</p>
<p>
If the other record is passive and the field has the PP
attribute (see
<a href="http://www.aps.anl.gov/asd/controls/epics/EpicsDocumentation/AppDevManuals/RecordRef/Recordref-1.html"
target="ex">Record Reference Manual</a>), the record will be processed.
It is your responsibility that the data type of the record field is
compatible to the the data type of the converter.
STRING formats are compatible with arrays of CHAR or UCHAR.
</p>
<p>
Be aware that using this syntax is by far not as efficient as using the
default field.
At the moment it is not possible to set the other record to an alarm
state if anything fails. It will simply not be processed if the fault
happens before or while handling it and it will already have been
processed if the fault happens later.
</p>

<h3>Pseudo-converters</h3>
<p>
Some formats are not actually converters.
They format data which is not stored in a record field, such as a
<a href="#chksum">checksum</a> or
<a href="#regsub">regular expression substitution</a>.
No data type corresponds to those <em>pseudo-converters</em> and the
<code>%(<em>FIELD</em>)</code> syntax cannot be used.
</p>

<a name="stdd"></a>
<h2>3. Standard DOUBLE Converters (<code>%f</code>, <code>%e</code>,
<code>%E</code>, <code>%g</code>, <code>%G</code>)</h2>
<p>
<b>Output:</b> <code>%f</code> prints fixed point, <code>%e</code> prints
exponential notation and <code>%g</code> prints either fixed point or
exponential depending on the magnitude of the value.
<code>%E</code> and <code>%G</code> use <code>E</code> instead of
<code>e</code> to separate the exponent.
</p>
<p>
With the <code>#</code> flag, output always contains a period character.
</p>
<p>
<b>Input:</b> All these formats are equivalent. Leading whitespaces are skipped.
</p>
<p>
With the <code>#</code> flag additional whitespace between sign and number
is accepted.
</p>
<p>
When a maximum field width is given, leading whitespace only counts to the
field witdth when the space flag is used.
</p>

<a name="stdl"></a>
<h2>4. Standard LONG and ULONG Converters (<code>%d</code>, <code>%i</code>,
<code>%u</code>, <code>%o</code>, <code>%x</code>, <code>%X</code>)</h2>
<p>
<b>Output</b>: <code>%d</code> and <code>%i</code> print signed decimal,
<code>%u</code> unsigned decimal, <code>%o</code> unsigned octal, and
<code>%x</code> or <code>%X</code> unsigned hexadecimal.
<code>%X</code> uses upper case letters.
</p>
<p>
With the <code>#</code> flag, octal values are prefixed with <code>0</code>
and hexadecimal values with <code>0x</code> or <code>0X</code>.
<p>
Unlike printf, <code>%x</code> and <code>%X</code> truncate the
output to the the given width (number of least significant half bytes).
</p>
</p>
<p>
<b>Input:</b> <code>%d</code> matches signed decimal, <code>%u</code> matches
unsigned decimal, <code>%o</code> unsigned octal.
<code>%x</code> and <code>%X</code> both match upper or lower case unsigned
hexadecimal.
Octal and hexadecimal values can optionally be prefixed.
<code>%i</code> matches any integer in decimal, or prefixed octal or
hexadecimal notation.
Leading whitespaces are skipped.
</p>
<p>
With the <code>-</code> negative octal and hexadecimal values are accepted. 
</p>
<p>
With the <code>#</code> flag additional whitespace between sign and number
is accepted.
</p>
<p>
When a maximum field width is given, leading whitespace only counts to the
field witdth when the space flag is used.
</p>

<a name="stds"></a>
<h2>5. Standard STRING Converters (<code>%s</code>, <code>%c</code>)</h2>
<p>
<b>Output:</b> <code>%s</code> prints a string.
If <em>precision</em> is specified, this is the maximum string length.
<code>%c</code> is a LONG format in output, printing one character!
</p>
<p>
<b>Input:</b> <code>%s</code> matches a sequence of non-whitespace characters
and <code>%c</code> matches a sequence of not-null characters.
The maximum string length is given by <em>width</em>.
The default <em>width</em> is infinite for <code>%s</code> and
1 for <code>%c</code>.
Leading whitespaces are skipped with <code>%s</code> except when
the space flag is used but not with <code>%c</code>.
The empty string matches.
</p>
<p>
With the <code>#</code> flag <code>%s</code> matches a sequence of not-null
characters instead of non-whitespace characters.
</p>
<p>
With the <code>0</code> flag <code>%s</code> pads with 0 bytes instead of
spaces.
</p>


<a name="cset"></a>
<h2>6. Standard Charset STRING Converter (<code>%[<em>charset</em>]</code>)</h2>
<p>
This is an input-only format.
It matches a sequence of characters from <em>charset</em>.
If <em>charset</em> starts with <code>^</code>, the format matches
all characters <u>not</u> in <em>charset</em>.
Leading whitespaces are not skipped.
</p>
<p>
Example: <code>%[_a-z]</code> matches a string consisting
entirely of <code>_</code> (underscore) or letters from <code>a</code>
to <code>z</code>.
</p>

<a name="enum"></a>
<h2>7. ENUM Converter (<code>%{<em>string0</em>|<em>string1</em>|...}</code>)</h2>
<p>
This format maps an unsigned integer value on a set of strings.
The value 0 corresponds to <em>string0</em> and so on.
The strings are separated by <code>|</code>.
</p>
<p>
Example: <code>%{OFF|STANDBY|ON}</code> mapps the string <code>OFF</code>
to the value 0, <code>STANDBY</code> to 1 and <code>ON</code> to 2.
</p>
<p>
When using the <code>#</code> flag it is allowed to assign integer values
to the strings using <code>=</code>.
Unassigned strings increment their values by 1 as usual.
</p>
<p>
If one string is the initial substing of another, the substing must come
later to ensure correct matching.
In particular if one string is the emptry string, it must be the last one
because it always matches.
Use <code>#</code> and <code>=</code> to renumber if necessary.
</p>
<p>
Use the assignment <code>=?</code> for the last string to make it the
default value for output formats.
</p>
<p>
Example: <code>%#{neg=-1|stop|pos|fast=10|rewind=-10}</code>.
</p>
<p>
If one of the strings contains <code>|</code> or <code>}</code>
(or <code>=</code>  if the <code>#</code> flag is used)
a <code>\</code> must be used to escape the character.
</p>
<p>
<b>Output:</b> Depending on the value, one of the strings is printed,
or the default if given and no value matches.
</p>
<p>
<b>Input:</b> If any of the strings matches, the value is set accordingly.
</p>

<a name="bin"></a>
<h2>8. Binary LONG or ULONG Converter (<code>%b</code>, <code>%B<em>zo</em></code>)</h2>
<p>
This format prints or scans an unsigned integer represented as a binary
string (one character per bit).
The <code>%b</code> format uses the characters <code>0</code> and
<code>1</code>.
With the <code>%B</code> format, you can choose two other characters
to represent zero and one.
With the <code>#</code> flag, the bit order is changed to <em>little
endian</em>, i.e. least significant bit first.
</p>
<p>
Examples: <code>%B.!</code> or <code>%B\x00\xff</code>.
<code>%B01</code> is equivalent to <code>%b</code>.
</p>
<p>
In output, if <em>width</em> is larger than the number of significant bits,
then the flag <code>0</code> means that the value should be padded with
with the chosen zero character instead of spaces.
If <em>precision</em> is set, it means the number of significant bits.
Otherwise, the highest 1 bit defines the number of significant bits. 
</p>
<p>
In input, leading spaces are skipped.
A maximum of <em>width</em> characters is read.
Conversion stops with the first character that is not the zero or the
one character.
</p>

<a name="raw"></a>
<h2>9. Raw LONG or ULONG Converter (<code>%r</code>)</h2>
<p>
The raw converter does not really "convert".
A signed or unsigned integer value is written or read in the internal
(usually two's complement) representation of the computer.
The normal byte order is <em>big endian</em>, i.e. most significant byte
first.
With the <code>#</code> flag, the byte order is changed to <em>little
endian</em>, i.e. least significant byte first.
With the <code>0</code> flag, the value is unsigned, otherwise signed.
</p>
<p>
In output, the <em>precision</em> (or sizeof(long) whatever is less) least
significant bytes of the value are sign extended or zero extended
(depending on the <code>0</code> flag) to <em>width</em> bytes.
The default for <em>precision</em> is 1. Thus if you do not specify
the <em>precision</em>, only the least significant byte is written!
It is common error to write <code>out "%2r";</code> instead of <code>out "%.2r";</code>.
</p>
<p>
In input, <em>width</em> bytes are read and put into the value.
If <em>width</em> is larger than the size of a <code>long</code>, only
the least significant bytes are used.
If <em>width</em> is smaller than the size of a <code>long</code>,
the value is sign extended or zero extended, depending on the
<code>0</code> flag.
</p>
<p>
Examples: <code>out "%.2r"; in "%02r";</code>
</p>

<a name="rawdouble"></a>
<h2>10. Raw DOUBLE Converter (<code>%R</code>)</h2>
<p>
The raw converter does not really "convert".
A float or double value is written or read in the internal
(maybe IEEE) representation of the computer.
The normal byte order is <em>big endian</em>, i.e. most significant byte
first.
With the <code>#</code> flag, the byte order is changed to <em>little
endian</em>, i.e. least significant byte first.
The <em>width</em> must be 4 (float) or 8 (double). The default is 4.
</p>

<a name="bcd"></a>
<h2>11. Packed BCD (Binary Coded Decimal) LONG or ULONG Converter (<code>%D</code>)</h2>
<p>
Packed BCD is a format where each byte contains two binary coded
decimal digits (<code>0</code> ... <code>9</code>).
Thus a BCD byte is in the range from <code>0x00</code> to <code>0x99</code>.
The normal byte order is <em>big endian</em>, i.e. most significant byte
first.
With the <code>#</code> flag, the byte order is changed to <em>little
endian</em>, i.e. least significant byte first.
The <code>+</code> flag defines that the value is signed, using the
upper half of the most significant byte for the sign.
Otherwise the value is unsigned.
</p>
<p>
In output, <em>precision</em> decimal digits are printed in at least
<em>width</em> output bytes.
Signed negative values have <code>0xF</code> in their most significant half
byte followed by the absolute value.
</p>
<p>
In input, <em>width</em> bytes are read.
If the value is signed, a one in the most significant bit is interpreted as
a negative sign.
Input stops with the first byte (after the sign) that does not represent a
BCD value, i.e. where either the upper or the lower half byte is larger
than 9.
</p>

<a name="chksum"></a>
<h2>12. Checksum Pseudo-Converter (<code>%&lt;<em>checksum</em>&gt;</code>)</h2>
<p>
This is not a normal "converter", because no user data is converted.
Instead, a checksum is calculated from the input or output.
The <em>width</em> field is the byte number from which to start
calculating the checksum.
Default is 0, i.e. the first byte of the input or output of the current
command.
The last byte is <em>precision</em> bytes before the checksum (default 0).
For example in <code>"abcdefg%&lt;xor&gt;"</code> the checksum is calculated
from <code>abcdefg</code>,
but in <code>"abcdefg%2.1&lt;xor&gt;"</code> only from <code>cdef</code>.
</p>
<p>
Normally, multi-byte checksums are in <em>big endian</em> byteorder,
i.e. most significant byte first.
With the <code>#</code> flag, the byte order is changed to <em>little
endian</em>, i.e. least significant byte first.
</p>
<p>
The <code>0</code> flag changes the checksum representation to
hexadecimal ASCII (2 chars per checksum byte).
</p>
<p>
The <code>-</code> flag changes the checksum representation to
"poor man's hex": 0x30 ... 0x3f (2 chars per checksum byte).
</p>
<p>
The <code>+</code> flag changes the checksum representation to
decimal ASCII (formatted with %d).
</p>
<!--
In output, the case of the ASCII checksum matches the case of first
letter of the function name.
E.g. <code>out&nbsp;"123456789%&lt;sum&gt;"</code> writes <code>dd</code>
but <code>out&nbsp;"123456789%&lt;Sum&gt;"</code> writes <code>DD</code>.
In input, case is ignored.
-->
</p>
<p>
In output, the checksum is appended.
</p>
<p>
In input, the next byte or bytes must match the checksum.
</p>

<h3>Implemented checksum functions</h3>
<dl>
 <dt><code>%&lt;sum&gt;</code> or <code>%&lt;sum8&gt;</code></dt>
  <dd>One byte. The sum of all characters modulo 2<sup>8</sup>.</dd>
 <dt><code>%&lt;sum16&gt;</code></dt>
  <dd>Two bytes. The sum of all characters modulo 2<sup>16</sup>.</dd>
 <dt><code>%&lt;sum32&gt;</code></dt>
  <dd>Four bytes. The sum of all characters modulo 2<sup>32</sup>.</dd>
 <dt><code>%&lt;negsum&gt;</code>, <code>%&lt;nsum&gt;</code>, <code>%&lt;-sum&gt;</code>, <code>%&lt;negsum8&gt;</code>, <code>%&lt;nsum8&gt;</code>, or <code>%&lt;-sum8&gt;</code></dt>
  <dd>One byte. The negative of the sum of all characters modulo 2<sup>8</sup>.</dd>
 <dt><code>%&lt;negsum16&gt;</code>, <code>%&lt;nsum16&gt;</code>, or <code>%&lt;-sum16&gt;</code></dt>
  <dd>Two bytes. The negative of the sum of all characters modulo 2<sup>16</sup>.</dd>
 <dt><code>%&lt;negsum32&gt;</code>, <code>%&lt;nsum32&gt;</code>, or <code>%&lt;-sum32&gt;</code></dt>
  <dd>Four bytes. The negative of the sum of all characters modulo 2<sup>32</sup>.</dd>
 <dt><code>%&lt;notsum&gt;</code> or <code>%&lt;~sum&gt;</code></dt>
  <dd>One byte. The bitwise inverse of the sum of all characters modulo 2<sup>8</sup>.</dd>
 <dt><code>%&lt;xor&gt;</code></dt>
  <dd>One byte. All characters xor'ed.</dd>
 <dt><code>%&lt;xor7&gt;</code></dt>
  <dd>One byte. All characters xor'ed &amp; 0x7F.</dd>
 <dt><code>%&lt;crc8&gt;</code></dt>
  <dd>One byte. An often used 8 bit crc checksum
  (poly=0x07, init=0x00, xorout=0x00).</dd>
 <dt><code>%&lt;ccitt8&gt;</code></dt>
  <dd>One byte. The CCITT standard 8 bit crc checksum
  (poly=0x31, init=0x00, xorout=0x00, reflected).</dd>
 <dt><code>%&lt;crc16&gt;</code></dt>
  <dd>Two bytes. An often used 16 bit crc checksum
  (poly=0x8005, init=0x0000, xorout=0x0000).</dd>
 <dt><code>%&lt;crc16r&gt;</code></dt>
  <dd>Two bytes. An often used reflected 16 bit crc checksum
  (poly=0x8005, init=0x0000, xorout=0x0000, reflected).</dd>
 <dt><code>%&lt;modbus&gt;</code></dt>
  <dd>Two bytes. The modbus 16 bit crc checksum
  (poly=0x8005, init=0xffff, xorout=0x0000, reflected)</dd>
 <dt><code>%&lt;ccitt16&gt;</code></dt>
  <dd>Two bytes. The usual (but <a target="ex"
   href="http://srecord.sourceforge.net/crc16-ccitt.html">wrong?</a>)
   implementation of the CCITT standard 16 bit crc checksum
   (poly=0x1021, init=0xFFFF, xorout=0x0000).</dd>
 <dt><code>%&lt;ccitt16a&gt;</code></dt>
  <dd>Two bytes. The unusual (but <a target="ex"
   href="http://srecord.sourceforge.net/crc16-ccitt.html">correct?</a>)
   implementation of the CCITT standard 16 bit crc checksum with augment.
   (poly=0x1021, init=0x1D0F, xorout=0x0000).</dd>
 <dt><code>%&lt;ccitt16x&gt;</code> or <code>%&lt;crc16c&gt;</code> or <code>%&lt;xmodem&gt;</code></dt>
  <dd>Two bytes. The XMODEM checksum.
   (poly=0x1021, init=0x0000, xorout=0x0000).</dd>
 <dt><code>%&lt;crc32&gt;</code></dt>
  <dd>Four bytes. The standard 32 bit crc checksum.
   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF).</dd>
 <dt><code>%&lt;crc32r&gt;</code></dt>
  <dd>Four bytes. The standard reflected 32 bit crc checksum.
   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0xFFFFFFFF, reflected).</dd>
 <dt><code>%&lt;jamcrc&gt;</code></dt>
  <dd>Four bytes. Another reflected 32 bit crc checksum.
   (poly=0x04C11DB7, init=0xFFFFFFFF, xorout=0x00000000, reflected).</dd>
 <dt><code>%&lt;adler32&gt;</code></dt>
  <dd>Four bytes. The Adler32 checksum according to <a target="ex"
   href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a>.</dd>
 <dt><code>%&lt;hexsum8&gt;</code></dt>
  <dd>One byte. The sum of all hex digits. (Other characters are ignored.)</dd>
 <dt><code>%&lt;lrc&gt;</code></dt>
  <dd>One byte. The Longitudinal Redundancy Check according to <a target="ex"
   href="https://en.wikipedia.org/wiki/Longitudinal_redundancy_check">Wikipedia</a>.</dd>
 <dt><code>%&lt;hexlrc&gt;</code></dt>
  <dd>One byte. The LRC for the hex digits. (Other characters are ignored.)</dd>
</dl>

<a name="regex"></a>
<h2>13. Regular Expresion STRING Converter (<code>%/<em>regex</em>/</code>)</h2>
<p>
This input-only format matches <a target="ex"
href="http://www.pcre.org/" >Perl compatible regular expressions (PCRE)</a>.
It is only available if a PCRE library is installed.
</p>
<div class="box">
<p>
If PCRE is not available for your host or cross architecture, download
the sourcecode from <a target="ex" href="https://www.pcre.org/">www.pcre.org</a>
and try my EPICS compatible <a target="ex"
href="http://epics.web.psi.ch/software/streamdevice/pcre/Makefile">Makefile</a>
to compile it like a normal EPICS support module.
The Makefile is known to work with EPICS 3.14.8 and PCRE 7.2.
In your RELEASE file define the variable <code>PCRE</code> so that
it points to the install location of PCRE.
</p>
<p>
If PCRE is already installed on (some of) your systems, you may add
architectures where PCRE can be found in standard include and library
locations to the variable <code>WITH_SYSTEM_PCRE</code>.
If either the header file or the library are in a non-standard place,
set in your RELEASE file the variables <code>PCRE_INCLUDE_<em>arch</em></code>
and/or <code>PCRE_LIB_<em>arch</em></code> for the respective architectures
to the correct directories or set
<code>PCRE_INCLUDE</code> and/or <code>PCRE_LIB</code>
in architecture specific RELEASE.Common.<em>arch</em> files.
</p>
</div>
<p>
If the regular expression is not anchored, i.e. does not start with
<code>^</code>, leading non-matching input is skipped.
To match in multiline mode (across newlines) add <code>(?m)</code>
at the beginning of the pattern.
To match case insensitive, add <code>(?i)</code>.
</p>
<p>
A maximum of <em>width</em> bytes is matched, if specified.
If <em>precision</em> is given, it specifies the sub-expression in <code>()</code>
whose match is returned.
Otherwise the complete match is returned.
In any case, the complete match is consumed from the input buffer.
If the expression contains a <code>/</code> it must be escaped like <code>\/</code>.
</p>
<p>
Example: <code>%.1/&lt;title&gt;(.*)&lt;\/title&gt;/</code> returns
the title of an HTML page, skipps anything before the
<code>&lt;title&gt</code> tag and leaves anything after the
<code>&lt;/title&gt;</code> tag in the input buffer.
</p>
<a name="regsub"></a>
<h2>14. Regular Expresion Substitution Pseudo-Converter (<code>%#/<em>regex</em>/<em>subst</em>/</code>)</h2>
<p>
This is a variant of the previous converter (note the <code>#</code>)
but instead of returning the matching string,
it can be used as a pre-processor for input or
as a post-processor for output.
</p>
<p>
Matches of the <em>regex</em> are replaced by the string <em>subst</em> with all
<code>&</code> in <em>subst</em> replaced with the match itself and all
<code>\1</code> through <code>\9</code> replaced with the match of the corresponding
sub-expression <span class="new"> if such a sub-expression exists.

Occurrences of <code>\U<i>n</i></code>, <code>\L<i>n</i></code>, <code>\u<i>n</i></code>,
or <code>\l<i>n</i></code> with <code><i>n</i></code> being a number <code>0</code>
through <code>9</code> or <code>&</code> are replaced with the corresponding sub-expression
converted to all upper case, all lower case, first letter upper case, or first letter lower
case, respectively.</span>
</p>
<p><span class="new">
Due to limitations of the parser, <code>\1</code> and <code>\x01</code> are the same 
which makes it difficult to use literal bytes with values lower than 10 in <em>subst</em>.
Therefore <code>\0</code> aways means a literal byte (incompatible change from earlier version!)
and <code>\1</code> through <code>\9</code> mean literal bytes if they are larger than
the number of sub-expressions.
</span>

To get a literal <code>&</code> or <code>\</code> or <code>/</code> in the substitution write
<code>\&</code> or <code>\\</code> or <code>\/</code>.
</p>
<p>
If <em>width</em> is specified, it limits the number of characters processed.
If the <code>-</code> flag is used (i.e. <em>width</em> looks like a negative number)
only the last <em>width</em> characters are processed, else the first.
Without <em>width</em> (or 0) all available characters are processed.
</p>
<p>
If <em>precision</em> is specified, it indicates which matches to replace.
With the <code>+</code> flag given, <em>precision</em> is the maximum
number of matches to replace.
Otherwise <em>precision</em> is the index (counting from 1) of the match to replace. 
Without <em>precision</em> (or 0), all matches are replaced.
</p>
<p>
When replacing multiple matches, the next match is searched directly after the currently
replaced string, so that the <em>subst</em> string itself will never be modified recursively.
<span class="new">
However if an empty string is matched, searching advances by 1 character in order to
avoid matching the same empty string again.</span>
</p>
<p>
In input this converter pre-processes data received from the device before
following converters read it.
Converters preceding this one will read unmodified input.
Thus place this converter before those whose input should be pre-processed.
</p>
<p>
In output it post-processes data already formatted by preceding converters
before sending it to the device.
Converters following this one will send their output unmodified.
Thus place this converter after those whose output should be post-processed.
</p>
<p>
Examples:
<div class="indent">
<code>%#+-10.2/ab/X/</code> replaces the string <code>ab</code> with <code>X</code>
maximal 2 times in the last 10 characters.
(<code>abcabcabcabc</code> becomes <code>abcXcXcabc</code>)
</div>
<div class="indent">
<code>%#/\\/\//</code> replaces all <code>\</code> with <code>/</code>
(<code>\dir\file</code> becomes <code>/dir/file</code>)
</div>
<div class="indent">
<code>%#/..\B/&:/</code> inserts <code>:</code> after every second character
which is not at the end of a word.
(<code>0b19353134</code> becomes <code>0b:19:35:31:34</code>)
</div>
<div class="indent">
<code>%#/://</code> removes all <code>:</code> characters.
(<code>0b:19:35:31:34</code> becomes <code>0b19353134</code>)
</div>
<div class="indent">
<code>%#/([^+-])*([+-])/\2\1/</code> moves a postfix sign to the front.
(<code>1.23-</code> becomes <code>-1.23</code>)<br>
</div>
<div class="indent">
<code>%#-2/.*/\U0/</code> converts the previous 2 characters to upper case.
</div>
<a name="mantexp"></a>
<h2>15. MantissaExponent DOUBLE converter (<code>%m</code>)</h2>
<p>
This exotic and experimental format matches numbers in the format
<i>[sign] mantissa sign exponent</i>, e.g <code>+123-4</code> meaning
123e-4 or 0.0123. Mantissa and exponent are decimal integers.
The sign of the mantissa is optional.
Compared to the standard <code>%e</code> format, this format does not
contain the characters <code>.</code> and <code>e</code>.
</p>
<p>
Output formatting is ambigous (e.g. <code>123-4</code> versus
<code>1230-5</code>). I chose the following convention:
Format <em>precision</em> defines number of digits in mantissa.
No leading '0' in mantissa (except for 0.0 of course).
Number of digits in exponent is at least 2.
Format flags <code>+</code>, <code>-</code>, and space are supported in
the usual way (always sign, left justified, space instead of + sign).
Flags <code>#</code> and <code>0</code> are unsupported.
</p>
<a name="timestamp"></a>
<h2>16. Timestamp DOUBLE converter (<code>%T(<em>timeformat</em>)</code>)</h2>
<p>
This format reads or writes timestamps and converts them to a double number.
The value represents the number of seconds since 1970 (the UNIX epoch).
The precision of a double is large enough for microseconds (but not for
nanoseconds). This format is probably used best in combination with a
redirection to the <code>TIME</code> field. In this case, the value is
converted to EPICS timestamps (seconds since 1990 and nanoseconds).
The timestamp format understands the usual converters that the C function
<em>strftime()</em> understands. In addition, fractions of a second can
be specified and the time zone can be set in the format string.
</p>
<p>
Example: <code>%(TIME)T(%d %b %Y %H:%M:%.3S %z)</code> may print something like
<code> 3 Sep 2010 15:45:59 +0200</code>.
</p>
<p>
Fractions of a second can be specified as <code>%.<em>n</em>S</code>
(seconds with <em>n</em> fractional digits), as <code>%0<em>n</em>f</code>
or <code>%<em>n</em>f</code> (<em>n</em> fractional digits) or as
<code>%N</code> (nanoseconds).
In input, <em>n</em> is the maximum number of digits parsed, there may be
actually less digits in the input.
If <em>n</em> is not specified (<code>%.S</code> or <code>%f</code>) it uses
a default value of 6.
</p>
<p>
In input, the time zone can be specified in the format like
<code>%+<em>hhmm</em></code> or <code>%-<em>hhmm</em></code> for cases
where the parsed time stamp does not specify the time zone, where
<em>hhmm</em> is a 4 digit number specifying the offset in hours and minutes.
</p>
<p>
In output, the system function <em>strftime()</em> is used to format the time.
There may be differences in the implementation between operating systems.
</p>
<p>
In input, <em>StreamDevice</em> uses its own implementation because many
systems are missing the <em>strptime()</em> function and additional formats
are supported.
</p>
<p>
Day of the week can be parsed but is ignored because the information is
redundant when used together with day, month and year and more or less
useless otherwise. No check is done for consistency.
</p>
<p>
Because of the complexity of the problem, locales are not supported.
Thus, only the English month names can be used (week day names are
ignored anyway).
</p>

<footer>
<a href="processing.html">Next: Record Processing</a>
Dirk Zimoch, 2018
</footer>
</body>
</html>
